
    Quantifying OpenMP: Statistical Insights into Usage and Adoption

    In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach to shared-memory parallel programming. Despite its wide adoption, however, there is a lack of comprehensive data on the actual usage of OpenMP constructs, hindering unbiased insights into its popularity and evolution. This paper presents a statistical analysis of OpenMP usage and adoption trends based on a novel and extensive database, HPCORPUS, compiled from GitHub repositories containing C, C++, and Fortran code. The results reveal that OpenMP is the dominant parallel programming model, accounting for 45% of all analyzed parallel APIs. Furthermore, it has demonstrated steady and continuous growth in popularity over the past decade. Analyzing specific OpenMP constructs, the study provides in-depth insights into their usage patterns and preferences across the three languages. Notably, we found that while OpenMP has a strong "common core" of widely used constructs (the rest of the API sees far less use), there are new adoption trends as well, such as the simd and target directives for accelerated computing and task for irregular parallelism. Overall, this study sheds light on OpenMP's significance in HPC applications and provides valuable data for researchers and practitioners. It showcases OpenMP's versatility, evolving adoption, and relevance in contemporary parallel programming, underlining its continued role in HPC applications and beyond. These statistical insights are essential for making informed decisions about parallelization strategies and provide a foundation for further advancements in parallel programming models and techniques.
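    The construct-level tallying that drives such statistics can be sketched in a few lines. The regex and sample snippet below are hypothetical illustrations only, not the HPCORPUS mining pipeline, which analyzes full GitHub repositories across C, C++, and Fortran:

```python
import re
from collections import Counter

# Toy directive counter in the spirit of the paper's analysis:
# tally which OpenMP constructs appear in C/C++ source text.
# Only simple single-line `#pragma omp` directives are handled.
OMP_PRAGMA = re.compile(r"#pragma\s+omp\s+([a-z]+(?:[ \t]+for)?)")

def count_openmp_constructs(source: str) -> Counter:
    """Return a Counter mapping directive name -> occurrence count."""
    return Counter(m.group(1) for m in OMP_PRAGMA.finditer(source))

# Hypothetical snippet exercising the "common core" (parallel for)
# and the newer adoption trends the study highlights (simd, target, task).
code = """
#pragma omp parallel for
for (int i = 0; i < n; i++) a[i] = i;
#pragma omp simd
for (int i = 0; i < n; i++) b[i] *= 2;
#pragma omp target
{ work(); }
#pragma omp task
{ irregular(); }
"""
counts = count_openmp_constructs(code)
print(counts.most_common())
```

A real analysis would also need to handle Fortran `!$omp` sentinels, line continuations, and clause lists, which this sketch deliberately ignores.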

    Scope is all you need: Transforming LLMs for HPC Code

    With easier access to powerful compute resources, there is a growing trend in the field of AI for software development to build larger and larger language models (LLMs) to address a variety of programming tasks. Even LLMs applied to tasks from the high-performance computing (HPC) domain are huge in size (e.g., billions of parameters) and demand expensive compute resources for training. We find this design choice confusing: why do we need large LLMs trained on natural languages and programming languages unrelated to HPC for HPC-specific tasks? In this line of work, we aim to question design choices made by existing LLMs by developing smaller LLMs for specific domains, which we call domain-specific LLMs. Specifically, we start with HPC as a domain and propose a novel tokenizer named Tokompiler, designed specifically for preprocessing code in HPC and compilation-centric tasks. Tokompiler leverages knowledge of language primitives to generate language-oriented tokens, providing a context-aware understanding of code structure while completely avoiding the human semantics attributed to code structures. We applied Tokompiler to pre-train two state-of-the-art models, SPT-Code and PolyCoder, on a Fortran code corpus mined from GitHub, and evaluated their performance against conventional LLMs. Results demonstrate that Tokompiler significantly enhances code completion accuracy and semantic understanding compared to traditional tokenizers in normalized-perplexity tests, reaching a perplexity score as low as ~1. This research opens avenues for further advancements in domain-specific LLMs, catering to the unique demands of HPC and compilation tasks.
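    The anonymization idea behind a language-oriented tokenizer can be illustrated with a toy sketch: keep language keywords, but replace human-chosen identifiers and numeric literals with numbered placeholder tokens so that only code structure remains. The keyword list and tokenization rules below are simplified assumptions, not Tokompiler's actual implementation:

```python
import re

# Minimal sketch of the anonymization idea: strip human-chosen
# names so a model sees code structure, not natural-language
# semantics. Keywords survive; identifiers and numbers become
# numbered placeholders, reused consistently within a snippet.
KEYWORDS = {"do", "end", "if", "then", "integer", "real", "program", "call"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")

def anonymize(code: str) -> list[str]:
    """Tokenize and replace identifiers/literals with placeholders."""
    ids, nums, out = {}, {}, []
    for tok in TOKEN.findall(code):
        low = tok.lower()
        if low in KEYWORDS:
            out.append(low)                      # keep language primitives
        elif tok[0].isdigit():
            out.append(nums.setdefault(tok, f"NUM_{len(nums)}"))
        elif tok[0].isalpha() or tok[0] == "_":
            out.append(ids.setdefault(low, f"VAR_{len(ids)}"))
        else:
            out.append(tok)                      # operators, punctuation
    return out

print(anonymize("do i = 1, n\n  total = total + a(i)\nend do"))
```

Note how the same identifier always maps to the same placeholder, preserving data flow while discarding the name itself.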

    MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers

    Automatic source-to-source parallelization of serial code for shared and distributed memory systems is a challenging task in high-performance computing. While many attempts have been made to translate serial code into parallel code for shared-memory environments (usually using OpenMP), none has managed to do so for distributed-memory environments. In this paper, we propose a novel approach, called MPI-rical, for automated MPI code generation using a transformer-based model trained on approximately 25,000 serial code snippets and their corresponding parallelized MPI code, drawn from more than 50,000 code snippets in our corpus (MPICodeCorpus). To evaluate the model, we break the serial-to-MPI code translation problem down into two sub-problems with corresponding research objectives: code completion, defined as predicting the MPI function for a given location in the source code, and code translation, defined as predicting both an MPI function and its location in the source code. We evaluate MPI-rical on the MPICodeCorpus dataset and on real-world scientific code benchmarks, comparing its performance on the code completion and translation tasks. Our experimental results show that while MPI-rical performs better on code completion than on code translation, the latter is better suited for real-world programming assistance, in which the tool suggests the need for an MPI function without prior knowledge of its location. Overall, our approach represents a significant step forward in automating the parallelization of serial code for distributed memory systems, which can save valuable time and resources for software developers and researchers. The source code used in this work, as well as other relevant sources, is available at: https://github.com/Scientific-Computing-Lab-NRCN/MPI-rica
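    The distinction between the two sub-problems can be made concrete with a toy scoring sketch. The ground-truth pairs, MPI function names, and metrics below are hypothetical illustrations of the task framing only, not the MPI-rical evaluation code:

```python
# Toy framing of the two sub-problems over one snippet. Ground
# truth: the translated code should gain MPI_Comm_rank at line 1
# and MPI_Reduce at line 3 (0-indexed lines, hypothetical example).
truth = {1: "MPI_Comm_rank", 3: "MPI_Reduce"}

def completion_score(pred_fn: str, line: int) -> bool:
    """Code completion: the location is given; only the predicted
    MPI function name must match the ground truth there."""
    return truth.get(line) == pred_fn

def translation_score(pred: dict[int, str]) -> float:
    """Code translation: both the function and its location must be
    predicted; return the fraction of ground-truth pairs recovered."""
    hits = sum(1 for line, fn in truth.items() if pred.get(line) == fn)
    return hits / len(truth)

# A model that finds the right spot but the wrong call gets partial
# credit on translation and fails completion at that line.
pred = {1: "MPI_Comm_rank", 3: "MPI_Bcast"}
print(completion_score("MPI_Reduce", 3), translation_score(pred))
```

The translation metric is the harder of the two: it rewards a model only when it identifies both where an MPI call belongs and which call it should be.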

    Split chloramphenicol acetyl-transferase assay reveals self-ubiquitylation-dependent regulation of UBE3B

    Split reporter protein-based genetic selection systems are widely used to identify and characterize protein-protein interactions (PPIs). Assembly of split markers that antagonize toxins, rather than being required for the synthesis of missing essential metabolites, facilitates the seeding of cells at high density and selective growth. Here we present a newly developed split chloramphenicol acetyltransferase (split-CAT)-based genetic selection system. The N-terminal fragment of CAT is fused downstream of the protein of interest and the C-terminal fragment is tethered upstream of a postulated protein partner. We demonstrate the system's advantages for the study of PPIs. Moreover, we show that co-expression of a functional ubiquitylation cascade, in which the target and ubiquitin are tethered to the split-CAT fragments, results in ubiquitylation-dependent growth on selective media. The fact that proteins do not have to be purified from bacteria, together with the high sensitivity of the split-CAT reporter, enables the detection of challenging protein cascades and post-translational modifications. In addition, we demonstrate that the split-CAT system responds to small-molecule inhibitors and molecular glues (GLUTACs). The absence of ubiquitylation-dependent degradation and deubiquitylation in E. coli significantly simplifies the interpretation of results. We demonstrate that the split-CAT system provides a readout for the known self-ubiquitylation-dependent inactivation of NEDD4. Subsequently, we harnessed the system to explore whether UBE3B, a HECT ligase that does not belong to the Nedd4 subfamily, is also regulated by self-ubiquitylation. We found that self-ubiquitylation of UBE3B at residue K665 inactivates the enzyme in the E. coli system and in mammalian cells due to its oligomerization.